Audio Processing & Speech-to-Text Pipeline¶
Audio → Transcript → Summary → Keywords¶
Audio files often contain valuable information, but reviewing them manually is time-consuming. This notebook automates the end-to-end pipeline: transcribing audio with Whisper, cleaning the raw transcript, generating concise summaries with FLAN-T5, and extracting top keywords for quick insight.
Workflow¶
Audio File (.mp3)
└── Whisper → Raw Transcript
    └── Clean → FLAN-T5 Summarizer
        ├── Chunk Summaries → Final Summary
        └── NLTK → Top Keywords
Import Libraries¶
import os
ffmpeg_path = r"C:\ffmpeg\ffmpeg-8.1-essentials_build\bin"
os.environ["PATH"] += os.pathsep + ffmpeg_path
import whisper
import re
import nltk
import string
from collections import Counter
from transformers import pipeline
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)      # one-time: tokenizer data for word_tokenize
nltk.download("stopwords", quiet=True)  # one-time: English stopword list
Transcribe Audio with Whisper¶
Whisper is OpenAI's open-source speech recognition model. The base model balances speed and accuracy well for lecture audio.
model = whisper.load_model("base")
print("Whisper model loaded successfully.")
Whisper model loaded successfully.
audio_file = r"C:\Users\ADMIN\lecture_full.mp3" # <-- update this path
result = model.transcribe(audio_file)  # pass fp16=False to silence the FP16-on-CPU warning below
transcript = result["text"]
C:\Users\ADMIN\anaconda3\envs\trading_env\Lib\site-packages\whisper\transcribe.py:132: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Clean the Transcript¶
Raw speech-to-text output contains filler words (um, uh, you know) and irregular spacing. This step strips them out with regular expressions so the text is ready for summarization.
def clean_transcript(text):
    # Drop fillers, collapse whitespace, keep only word chars and basic punctuation
    text = re.sub(r"\b(um|uh|hmm|you know|like)\b", "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text)
    text = re.sub(r"[^\w\s.,!?]", "", text)
    return text.strip()
cleaned_text = clean_transcript(transcript)
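A quick sanity check of the cleaner on a toy string (re-defining clean_transcript so the snippet stands alone):

```python
import re

def clean_transcript(text):
    # Same cleaner as above: drop fillers, collapse whitespace,
    # strip everything but word chars and basic punctuation.
    text = re.sub(r"\b(um|uh|hmm|you know|like)\b", "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text)
    text = re.sub(r"[^\w\s.,!?]", "", text)
    return text.strip()

print(clean_transcript("Um so the model is uh quite good"))
# → "so the model is quite good"
```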
Summarize with FLAN-T5 and Extract Top Keywords¶
FLAN-T5 (google/flan-t5-base) is an instruction-tuned seq2seq model well-suited for summarization tasks.
Using NLTK, stopwords and punctuation are filtered out, and the most frequent meaningful words are extracted. These keywords give a quick sense of the lecture's core topics.
summarizer = pipeline(
    "text2text-generation",  # FLAN-T5 is a seq2seq model, not a causal LM
    model="google/flan-t5-base"
)
def chunk_text(text, chunk_size=1000):
    # Fixed-width character windows; the last chunk may be shorter
    return [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
chunks = chunk_text(cleaned_text)
print(f"\nTotal chunks created: {len(chunks)}")
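A quick check of the chunker on synthetic text (re-defining chunk_text so the snippet stands alone):

```python
def chunk_text(text, chunk_size=1000):
    # Fixed-width character windows; the last chunk may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

sample = "x" * 2500
pieces = chunk_text(sample)
print(len(pieces), [len(p) for p in pieces])
# → 3 [1000, 1000, 500]
```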
all_summaries = []
for i, chunk in enumerate(chunks):
    print(f"Summarizing chunk {i+1}/{len(chunks)}...")
    prompt = f"Summarize this lecture transcript clearly and concisely:\n\n{chunk}"
    result = summarizer(
        prompt,
        max_length=200,
        do_sample=False
    )
    all_summaries.append(result[0]["generated_text"])

final_summary = " ".join(all_summaries)
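One caveat: chunk_text splits at raw character offsets, so it can cut a word in half at every boundary. A variant that backs off to the last whitespace before the cut (a sketch; chunk_text_ws is a hypothetical helper, not part of the notebook above):

```python
def chunk_text_ws(text, chunk_size=1000):
    # Like chunk_text, but back off to the last space before the cut
    # so words are never split across chunks.
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            cut = text.rfind(" ", start, end)
            if cut > start:   # only back off when a space exists in the window
                end = cut
        chunks.append(text[start:end].strip())
        start = end
    return chunks

print(chunk_text_ws("alpha beta gamma delta", chunk_size=10))
# → ['alpha', 'beta', 'gamma', 'delta']
```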
stop_words = set(stopwords.words("english"))
words = word_tokenize(cleaned_text.lower())
words = [
    word for word in words
    if word not in stop_words
    and word not in string.punctuation
    and len(word) > 2
]
keywords = Counter(words).most_common(10)
print("\n===== TOP KEYWORDS =====\n")
for word, freq in keywords:
    print(f"{word}: {freq}")
===== TOP KEYWORDS =====

time: 23
series: 21
data: 19
analysis: 14
component: 10
use: 7
thats: 7
one: 6
components: 6
forecasting: 6
print(final_summary[:1000])
Summarize this lecture transcript clearly and concisely: My smartwatch tracks how much sleep I get each night. If Im feeling curious, I can look on my phone and see my nightly slumber plotted on a graph. It might look something this. And on the graph, on the Y axis, we have the hours of sleep. And then on the X axis, we have days. And this is an example of a time series. And what a time series is is data of the same entity, my sleep hours, collected at regular intervals, over days. And when we have time series, we can perform a time series analysis. And this is where we analyse the timestamp data to extract meaningful insights and predictions about the future. And while its super useful to forecast that I am going to probably get seven hours shut eye tonight based on the data, time series analysis plays a significant role in helping organisations drive better business decisions. So for example, using time series analysis, a retailer can use this functionality to predict future sales a
Summary¶
| Step | Tool | Output |
|---|---|---|
| Transcribe audio | openai-whisper | Raw transcript string |
| Clean transcript | re (regex) | Cleaned text |
| Summarize | FLAN-T5 via transformers | Chunk-wise + final summary |
| Extract keywords | NLTK | Top 10 frequent terms |
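The four stages above can be glued into one reusable function. The sketch below takes the transcriber and summarizer as injected callables (run_pipeline and both parameter names are hypothetical, not from the notebook), which keeps the flow testable with stubs instead of real models; the keyword step uses a plain split rather than NLTK so the sketch stays dependency-free:

```python
import re
import string
from collections import Counter

def run_pipeline(transcribe, summarize, audio_path, stop_words, chunk_size=1000):
    # 1) Transcribe (e.g. lambda p: whisper_model.transcribe(p)["text"])
    raw = transcribe(audio_path)
    # 2) Clean: drop fillers, collapse whitespace (same regexes as clean_transcript)
    text = re.sub(r"\b(um|uh|hmm|you know|like)\b", "", raw, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text).strip()
    # 3) Chunk and summarize chunk-by-chunk, then join
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    summary = " ".join(summarize(c) for c in chunks)
    # 4) Keyword counts on the cleaned text
    words = [w for w in text.lower().split()
             if w.strip(string.punctuation) not in stop_words and len(w) > 2]
    keywords = Counter(words).most_common(10)
    return summary, keywords

# Stubbed usage (no models needed):
demo_summary, demo_keywords = run_pipeline(
    transcribe=lambda path: "um the data data data is useful for analysis",
    summarize=lambda chunk: "SUM",
    audio_path="dummy.mp3",
    stop_words={"the", "is", "for"},
)
print(demo_summary, demo_keywords)
```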